Multi-frame Collaboration for Effective Endoscopic Video Polyp Detection via Spatial-Temporal Feature Transformation
Authors
Abstract
Precise localization of polyps is crucial for early cancer screening in gastrointestinal endoscopy. Videos produced by endoscopy bring both richer contextual information and more challenges than still images. The camera-moving situation, instead of the common camera-fixed-object-moving one, leads to significant background variation between frames. Severe internal artifacts (e.g., water flow in the human body, specular reflection of tissues) can make the quality of adjacent frames vary considerably. These factors hinder a video-based model from effectively aggregating features from neighboring frames and giving better predictions. In this paper, we present Spatial-Temporal Feature Transformation (STFT), a multi-frame collaborative framework to address these issues. Spatially, STFT mitigates inter-frame variations in the camera-moving situation with feature alignment by proposal-guided deformable convolutions. Temporally, STFT proposes a channel-aware attention module to simultaneously estimate inter-frame correlation and perform adaptive feature aggregation. Empirical studies and superior results demonstrate the effectiveness and stability of our method. For example, STFT improves the still-image baseline FCOS by \(10.6\%\) and \(20.6\%\) on the comprehensive F1-score of the polyp localization task on the CVC-Clinic and ASUMayo datasets, respectively, and outperforms the state-of-the-art video-based method by \(3.6\%\) and \(8.0\%\), respectively. Code is available at https://github.com/lingyunwu14/STFT.
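The channel-aware aggregation idea described above can be illustrated with a minimal NumPy sketch. This is a hypothetical, simplified stand-in, not STFT's actual learned module: it assumes neighbor features are already spatially aligned, and it uses per-channel cosine similarity to the reference frame as a correlation score, softmax-normalized across frames to form adaptive per-channel weights.

```python
import numpy as np

def channel_aware_aggregate(ref, neighbors):
    """Adaptively fuse neighbor frame features into the reference frame.

    ref:       (C, H, W) feature map of the reference frame
    neighbors: list of (C, H, W) feature maps from adjacent frames,
               assumed already spatially aligned to the reference
    """
    frames = [ref] + list(neighbors)
    C = ref.shape[0]
    # Per-channel cosine similarity between each frame and the reference:
    # a rough proxy for how correlated (and thus how trustworthy) each
    # neighbor channel is for this reference frame.
    scores = np.empty((len(frames), C))
    for i, f in enumerate(frames):
        for c in range(C):
            a, b = ref[c].ravel(), f[c].ravel()
            scores[i, c] = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    # Softmax over the frame axis -> adaptive per-channel weights.
    w = np.exp(scores - scores.max(axis=0, keepdims=True))
    w /= w.sum(axis=0, keepdims=True)
    # Weighted sum of feature maps, channel by channel.
    return sum(w[i][:, None, None] * f for i, f in enumerate(frames))
```

In the real method, both the alignment (proposal-guided deformable convolution) and the attention weights are learned end-to-end rather than computed with a fixed similarity heuristic.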
Similar Resources
Polyp Detection in Endoscopic Video Using SVMs
Colon cancer is one of the most common cancers in developed countries. Most of these cancers start with a polyp. Polyps are easily detected by physicians. Our goal is to mimic this detection ability so that endoscopic videos can be pre-scanned with our algorithm before the physician analyses them. The method will indicate which part of the video needs attention (polyps were detected there) and ...
Deep Spatial-Temporal Joint Feature Representation for Video Object Detection
With the development of deep neural networks, many object detection frameworks have shown great success in the fields of smart surveillance, self-driving cars, and facial recognition. However, the data sources are usually videos, and the object detection frameworks are mostly established on still images and only use the spatial information, which means that the feature consistency cannot be ens...
Tight frame approximation for multi-frames and super-frames
In this thesis, a generator for multi-frames or super-frames generated under the action of a projective unitary representation of a countable discrete group is studied. Examples of such frames include Gabor multi-frames, Gabor super-frames, and frames for shift-invariant subspaces. We show that there exists a unique normalized tight multi-frame (super-frame) generator that attains the minimal distance from it. Similar problems for dual frames are also posed, and some ...
A spatial-temporal approach for video caption detection and recognition
We present a video caption detection and recognition system based on a fuzzy-clustering neural network (FCNN) classifier. Using a novel caption-transition detection scheme we locate both spatial and temporal positions of video captions with high precision and efficiency. Then employing several new character segmentation and binarization techniques, we improve the Chinese video-caption recogniti...
Spatial-Temporal Memory Networks for Video Object Detection
We introduce Spatial-Temporal Memory Networks (STMN) for video object detection. At its core, we propose a novel Spatial-Temporal Memory module (STMM) as the recurrent computation unit to model long-term temporal appearance and motion dynamics. The STMM’s design enables the integration of ImageNet pre-trained backbone CNN weights for both the feature stack as well as the prediction head, which ...
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-87240-3_29